Abstract

A computational framework has been developed to carry out identity management, that is, the automatic inference of the identities of targets tracked by surveillance systems that cover wide areas such as a shopping mall or a large harbor. People or vehicles may remain invisible to the system for long periods of time as they move between sensors. Identity management attempts to infer from uncertain measurements who or what is where at all times.

The following work was performed in this short-term project:

• Fleshed out and streamlined the mathematical framework for identity management. This required significant changes at the core of the framework, and several of the ideas built on top of this had to be adapted or reinvented as well, prompting a systematic reformulation of the mathematics.

• Studied and tested algorithms from the literature to be used, either directly or in modified form, in the core inference engine of an identity management system.

• Developed a computationally efficient method for finding high-likelihood identity assignments given a graph of association probabilities between sensor observations. This method efficiently solves the batch version of the main estimation problem underlying identity management.
Illustration Captions and Appendices


Figure 1: Algorithms that measure the cost of a cut by the sum of its edge weights will not remove the weak, spurious edges $e_1, e_2$ between components A and B of this graph until after “dangling” connections like $e_3, e_4$ are severed.

Figure 2: Top left: An observation graph. Blue edges correspond to strong associations, red edges to weak ones. Other panels: Partitions in the complete filtration for the graph at top left. The ranges of $\pi$ that result in each partition are shown above each partition. The boxed partition is the most persistent one.

Appendix A: Bayesian Filtering


1 Statement of the Problem Studied

Surveillance systems often cover wide areas such as a shopping mall, a city’s subway system, a large harbor, or several city blocks around a sensitive installation. People or vehicles (henceforth targets) are detected by sensors (typically cameras or camera/LIDAR combinations) in different locations and at different points in time. Targets may remain invisible for several seconds or minutes as they move through areas with no coverage. Even when they remain in the field of view of, say, a single camera, targets may be occluded by other targets or by objects, or may become indistinguishable from other targets. Even when a target is distinctly visible, its appearance may vary as a result of changes in lighting, distance, pose, or sensing parameters.

The fundamental challenge, then, is how to infer an accurate, consistent picture of the world from intermittent, uncertain observations. This problem has been called identity management in the literature [4], and it has been the main focus of the work performed under this short-term grant.


2 Summary of the Most Important Results

The mathematical framework sketched out in the grant proposal was fleshed out and streamlined, as described in Section 2.1 below. This required significant changes at the core of the framework, and several of the ideas built on top of it had to be adapted or reinvented as well, prompting a systematic reformulation of the mathematics. The most promising algorithms from the literature, discussed in Section 2.2, were then tested empirically with simulated data, as described in Section 2.3. Our dissatisfaction with these results was traced to the so-called additivity problem, described in Section 2.4. This difficulty prompted the development of a new formulation of the batch version of the main inference problem for identity management: the partition filtration associated with a graph of observations. The partition filtration is introduced in Section 2.5.


2.1 The Mathematical Framework

An important improvement over the mathematical framework sketched in the proposal for this grant is a shift in the basic representation of observations and states. In the old version, a combinatorial structure called a multipartite partition was used to describe deterministic knowledge, or belief, as to which sets of observations relate to the same target. Specifically, the set of all observations can be partitioned into sets, one set per target. This partition was made multipartite in the proposal to capture the notion that, for certain pairs of observations, it is possible to know with certainty that they refer to distinct targets. For instance, a single sensor is assumed to be able to produce at most one observation of a target at any point in time, so two observations produced simultaneously by the same sensor are assumed to relate to different targets. Another example is a pair of observations that, although taken at different points in time, come from sensors that are so distant that no target could have moved from one sensor to the other in the given time interval. More generally, the observations made by a set of sensors in a surveillance network can be split into $K$ sets such that the sightings within a set are certain to refer to distinct identities. This split makes the partitions multipartite.
Initially in this effort, mathematics and simulations were all carried out in this context of multipartite partitions. After some time, however, it became clear that constraining the partitions to be multipartite led to more difficulties than it addressed. The main reason is that, while the partition aspect of a multipartite partition captures equality (as in “observation a is equal to observation b”), its multipartite aspect captures non-equality (“a is not equal to b”). Since the notion of “not equal” is not transitive, the resulting clash produced increasing complications in both data structures and inference methods.

The solution to this difficulty turned out to be both simple and effective. Multipartition was removed, and non-equality is now captured by (i) assigning a zero probability to pairs of observations that are known with certainty to refer to distinct targets, and (ii) clamping these zero values, that is, forcing them to remain unchanged, during inference. This change in the representation that lives at the very core of the framework required a systematic modification of everything built on top of it, including changing weighted match graphs into weighted association graphs (the former require multipartition, the latter do not).

Rather than describing in detail the history of all the changes, their net result, that is, the new framework, is briefly outlined next. This description follows closely the reasoning in a new grant proposal submitted to ARO for a full-fledged investigation of identity management.

A key representational decision is to capture the information contained in a set of $n$ measurements through $m$ association values $0 \le p_{ij} \le 1$, defined as the probability that measurements $i$ and $j$ were generated by the same target, given the values of the measurements.

This choice contrasts with the more usual approach of summarizing measurements through $n$ feature vectors, one per measurement, and then defining a metric in their space.

Pairwise associations are preferred over individual feature vectors for two reasons. First, sensors used in different parts of the space under observation can be of different types (cameras, LIDARs, proximity sensors, and others). The corresponding output values are then heterogeneous, and no single space is likely to fit all types of outputs naturally. Second, the time elapsed between the two measurements is an important source of information for computing the association value between them. This computation must consider the distance between the sensors, estimates of travel speeds, and the presence of possible delays (in an airport, delays could come from stores, restaurants, security lines, ...) or accelerators (moving walkways, escalators, ...) between the two measurement stations. These considerations cannot be captured by either measurement alone, but can be incorporated in the association between sensor measurements at different stations.

For these reasons, associations are more flexible and potentially richer than separate measurement features. Of course, associations can be computed from metrics defined in a feature space whenever the situation warrants, that is, when measurements happen to be homogeneous. Thus, associations subsume the standard approach, and provide a representational foundation for a broader set of circumstances.

Both the number $n$ of measurements and the number $m$ of association values are growing functions of time $t$. The $n(t)$ measurements can be represented as the set $V(t)$ of nodes in a growing, weighted association graph $G(t) = (V(t), E(t), P(t))$. Two measurements are connected by an edge in $E(t)$ if an association value $p_{ij} \in P(t)$ is available for them, and $p_{ij}$ is the weight of that edge.¹ The graph $G(t)$ is generally not complete, as it does not always make sense to establish associations for two measurements. For instance, two sensors may be cameras that look at people from different directions (perhaps from the front and from the back), or the two measurements may be separated by an excessively long time interval, and this may make matching between these views meaningless.

¹ For convenience, the reflexive property is ignored throughout this report. In other words, a measurement is not associated with, or considered equivalent to, itself.

Bayesian estimation is made possible by the second key contribution of our work: a method for defining a probability distribution over association graphs, and therefore over equivalence graphs as well. This is a new definition, significantly different from the one in our original proposal. Briefly, and following the spirit of the idea that Mallows [6] introduced for permutations, we define a measure of compatibility between a partition graph $\Gamma = (V, E, P)$ and an association graph $H = (V, F, Q)$ on the same set of nodes $V$. Recall that a partition graph is equivalent to a partition of its nodes, so that $E$ is the complete set of edges, $P$ is binary (that is, $p_{ij}$ is either 0 or 1), and $P$ specifies a set of disjoint cliques. The compatibility between $\Gamma$ and $H$ is defined as follows:


$$ d(\Gamma, H) = \sum_{(i,j) \in E \cap F} |p_{ij} - q_{ij}| \qquad (1) $$


This is not a distance. For instance, if the edge set $F$ in $H$ is empty, then $d(\Gamma, H) = 0$ regardless of $\Gamma$. More generally, this measure of compatibility does not penalize unspecified edges in the association graph $H$. As a consequence, ignorance as to whether two observations do or do not correspond to the same identity is compatible with any assignment of identities to observations.
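A minimal Python sketch of equation (1) follows. It assumes a hypothetical representation used only for illustration: the partition graph $\Gamma$ is a list `labels` (nodes with equal labels form one clique, so $p_{ij}$ is 1 or 0), and the association graph $H$ is a dictionary `assoc` mapping pairs `(i, j)` with `i < j` to $q_{ij}$, with absent pairs playing the role of unspecified edges. Because $E$ is complete, the sum over $E \cap F$ reduces to a sum over the keys of `assoc`.

```python
def compatibility(labels, assoc):
    """Compatibility d(Gamma, H) of equation (1).

    labels: labels[i] is the class of node i in the partition graph Gamma,
            so p_ij = 1 if labels[i] == labels[j] and 0 otherwise.
    assoc:  hypothetical dict {(i, j): q_ij} for i < j; missing pairs are
            unspecified edges of H and contribute nothing.
    """
    total = 0.0
    for (i, j), q in assoc.items():
        p = 1.0 if labels[i] == labels[j] else 0.0
        total += abs(p - q)
    return total


# Unspecified edges are never penalized: an empty H is compatible with anything.
print(compatibility([0, 1, 2], {}))  # 0.0
```

The call above shows the property noted in the text: an association graph without edges yields zero incompatibility for every partition.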

Ignorance, on the other hand, is the basis for the following definition of dispersion, a measure of the uncertainty implied by an association graph $H = (V, F, Q)$:


$$ \sigma(H) = \sum_{(i,j) \in F} \min(q_{ij}, 1 - q_{ij}) + \frac{1}{2} \left[ \frac{|V|(|V|-1)}{2} - |F| \right] \qquad (2) $$


The first term in this measure penalizes uncertain associations, that is, values of $q_{ij}$ that are different from either 0 or 1. The second term penalizes missing associations, since the expression in square brackets is the number of edges that are in the complete graph on $V$ but not in $F$. Each such missing association receives a penalty of 1/2. The maximum penalty for uncertain associations is also 1/2, so that the maximum possible dispersion for an association graph with $|V|$ nodes is

$$ \frac{1}{2} \cdot \frac{|V|(|V|-1)}{2} = \frac{|V|(|V|-1)}{4} $$

and is attained, for instance, by the empty association graph.
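The dispersion can be computed in the same hypothetical `assoc` representation. The sketch below takes the per-edge penalty for uncertainty to be $\min(q_{ij}, 1 - q_{ij})$, which vanishes for $q_{ij} \in \{0, 1\}$ and reaches its maximum of 1/2 at $q_{ij} = 1/2$, consistent with the description above; each missing pair of the complete graph adds 1/2.

```python
def dispersion(n_nodes, assoc):
    """Dispersion sigma(H) of an association graph H = (V, F, Q).

    n_nodes: |V|
    assoc:   hypothetical dict {(i, j): q_ij}, i < j, holding the edges F of H.
    """
    # Uncertain associations: 0 when q is 0 or 1, at most 1/2 when q = 1/2.
    uncertain = sum(min(q, 1.0 - q) for q in assoc.values())
    # Missing associations: pairs of the complete graph on V that are not in F.
    n_pairs = n_nodes * (n_nodes - 1) // 2
    missing = n_pairs - len(assoc)
    return uncertain + 0.5 * missing


print(dispersion(4, {}))  # empty graph on 4 nodes: 4 * 3 / 4 = 3.0
```

A complete graph whose values are all 1/2 reaches the same maximum as the empty graph, since both incur the full penalty of 1/2 on every pair.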

Then, given a distinguished set of $C$ equivalence graphs $\Gamma_1, \ldots, \Gamma_C$ and $C$ dispersion parameters $\sigma_1, \ldots, \sigma_C$, we define the following parametric probability measure on the set $A$ of association graphs:

$$ p(G) = e^{-\psi(\sigma, \gamma)} \sum_{c=1}^{C} e^{-d(\Gamma_c, G)/\sigma_c} \qquad (3) $$

where

$$ \psi(\sigma, \gamma) = \log \sum_{G' \in A} \sum_{c=1}^{C} e^{-d(\Gamma_c, G')/\sigma_c} $$

is the cumulant function and

$$ \sigma = (\sigma_1, \ldots, \sigma_C)^T \quad \text{and} \quad \gamma = (\Gamma_1, \ldots, \Gamma_C)^T . $$
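The cumulant function sums over every association graph, so it is not something one would evaluate directly. What is cheap is the unnormalized mixture $\sum_c e^{-d(\Gamma_c, G)/\sigma_c}$, which is enough to compare the relative likelihood of two association graphs under the same $\sigma$ and $\gamma$. The sketch below assumes the same hypothetical `labels`/`assoc` representation as before, with made-up reference partitions and graphs, and repeats the compatibility function so that it runs on its own.

```python
import math


def compatibility(labels, assoc):
    """d(Gamma, G) of equation (1); labels encode Gamma, assoc maps (i, j) -> q_ij."""
    return sum(abs((1.0 if labels[i] == labels[j] else 0.0) - q)
               for (i, j), q in assoc.items())


def unnormalized_likelihood(assoc, references, sigmas):
    """Mixture of equation (3) without the normalizing factor exp(-psi).

    references: label lists for Gamma_1, ..., Gamma_C
    sigmas:     dispersion parameters sigma_1, ..., sigma_C
    """
    return sum(math.exp(-compatibility(ref, assoc) / s)
               for ref, s in zip(references, sigmas))


refs, sigmas = [[0, 0, 1], [0, 1, 1]], [0.5, 0.5]
g_close = {(0, 1): 0.9, (1, 2): 0.1}   # agrees with the first reference
g_far = {(0, 1): 0.1, (1, 2): 0.1}     # agrees with neither reference
print(unnormalized_likelihood(g_close, refs, sigmas) >
      unnormalized_likelihood(g_far, refs, sigmas))  # True
```

Because the normalizing factor is shared, the ratio of two such scores equals the ratio of the corresponding probabilities under equation (3).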
This probability measure on the set of association graphs provides a conceptual foundation for learning and inference in the context of identity management, based on the general principles of Sequential Importance Sampling for Bayesian filtering [1].

Specifically, given a sequence $y(t_0), y(t_1), \ldots$ of measurements over time, in the form of association values $p_{ij}$, the current belief about the true state $x(t)$ (a particular equivalence graph) is captured by a probability distribution of the form (3) over the space of equivalence graphs. A measurement model, that is, a conditional probability distribution $p(y(t) \mid x(t))$, captures what is known about sensing: if the current state $x(t)$ is known, the current measurement $y(t)$ is assumed to be independent of past and future states, and the measurement model is a unimodal version ($C = 1$) of a distribution of the form (3) on association graphs, with an equivalence graph forming the reference graph $x(t) = \Gamma_1$. Finally, the state variables are assumed to form a Markov sequence, so that their interdependence is entirely captured by the transition model $p(x(t + \Delta t) \mid x(t))$, again of the form (3), but on equivalence graphs only.

Given these models and an initial estimate $p(x(0) \mid y(0))$, Bayesian filtering then estimates, for any desired time $t$, the posterior probability distribution $p(X(t) \mid Y(t))$, where

$$ X(t) = \{ x(\tau) \mid 0 \le \tau \le t \} \quad \text{and} \quad Y(t) = \{ y(\tau) \mid 0 \le \tau \le t \} $$

are the accumulated histories of states and measurements. This posterior distribution can be used, when desired, to obtain MAP point estimates of the identities associated with the observations:

$$ \hat{X}(t) = \arg\max_{X(t)} p(X(t) \mid Y(t)) . $$

Bayesian filtering was outlined in the proposal for this grant, and is essentially unaffected by the changes of formulation described so far. It is therefore summarized in Appendix A for completeness.

2.2 Batch Solution

The crucial computation in the proposed estimation procedure is initialization, or batch solution, that is, the computation of the posterior probability $p(X(0) \mid Y(0)) = p(x(0) \mid y(0))$. Here, $y(0)$ is a set of association values $p_{ij}$ between observations, and $x(0)$ is a partition of the set of observations into distinct identities.

This computation can be viewed as a batch (that is, non-recursive) version of stochastic estimation, in which the probability distribution over possible identities is estimated from a fixed set of observations (association values). If the number of observations is relatively small, and all data are available ahead of time, initialization solves the whole data association problem. For data sets that grow indefinitely over time, propagation and update are necessary in addition, as outlined in Appendix A.

The posterior $p(x(0) \mid y(0))$ can be computed by first determining high-likelihood estimates $\hat{x}(0)$ of $x(0)$ from $y(0)$. We can then use a Mallows-like expression of the form (3) with the estimates playing the role of the equivalence graphs $\Gamma_c$.

The computation of likely graph partitions $\hat{x}(0)$ from a graph $y(0)$ of association values has been viewed in the literature in different but related ways:

• A partition of the node set $V(t)$ can be represented by an equivalence graph made of disjoint cliques: two nodes are equivalent if and only if they are in the same clique. Identity management can then be cast as the problem of finding the equivalence graph closest to $G(t)$, in a metric to be specified. Formally, equivalence graphs are special cases of association graphs, namely, those made of disjoint cliques and with binary edge weights.

• The same partition of $V(t)$ can be viewed as a clustering of $V(t)$ in a metric specified partially by the association weights in $P(t)$. Identity management then optimizes a ratio between intra-cluster and inter-cluster spread, suitably defined.

• If the number $k$ of sets in the clustering or partition of $V(t)$ were known, the necessary computation could be restated as a graph-cut problem: find a minimum-weight subset of $E(t)$ whose removal separates the graph into $k$ connected components. Identity management can then be phrased as a joint estimation of $k$ and the corresponding optimal $k$-cut.

These approaches differ in what is being optimized: a distance between graphs, a spread ratio, or the cost of a cut. They correspond to different approaches and algorithms in the literature. However, all these algorithms are designed to work with (or imply) additive measures for graph cuts, in that they use the sum of association values along edges to determine whether a set of edges should be removed.

To experiment with these approaches, we developed a software infrastructure with the elements described in the next section. Section 2.4 then describes a key problem common to approaches that estimate $x(0)$ by additive measures of graph cuts, and Section 2.5 presents our solution to this problem.

2.3 Experimental Setup

The main question addressed by our experiments is the extent to which the initial state estimate $p(X(0) \mid Y(0))$, henceforth abbreviated to $p(X \mid Y)$, reflects ground truth, as determined by simulation, with varying amounts of uncertainty in the input association graph.

Answering this question required the programming of the following modules:

• A simulation module that creates a partition graph $\Gamma$ and perturbs it in a controlled way to produce an association graph $G$. Perturbations include modification of the binary values on the edges of $\Gamma$ into association values in the interval $[0, 1]$ for the edges of $G$. They also include removal of a specified number of edges from $G$, to simulate unspecified edges.

• The SV module, an implementation of the Saran and Vazirani $k$-cut algorithm [8], which takes an association graph $G$ and an integer $k$ as inputs and produces a set of edges of approximately minimum weight (within a factor $2 - 2/k$) whose removal leaves $k$ connected components.

• A clustering module that takes an association graph $G$, fixed integers $C$ and $N$ with $N \gg C$, and a real number $r$ with $0 < r < 1$. This module generates $N$ random values of $k$ by a Dirichlet process. For each value of $k$, the clustering module draws a fraction $r$ of edges from $G$, and runs a graph partition estimation algorithm on the resulting graph with parameter $k$. This produces a sequence $\Gamma_1, \ldots, \Gamma_N$ of association graphs, which are then clustered with the $C$-means algorithm. The output from this module is the set of resulting cluster centers $\Gamma_1, \ldots, \Gamma_C$, together with the dispersion parameters $\sigma_1, \ldots, \sigma_C$ computed through equation (2).
• An evaluation module that determines the likelihood of association graph $G$ by computing expression (3).

• A display module that shows statistics of the results as a function of the structure of $G$ and the extent of its perturbations. Likelihood is expected to be a decreasing function of the number of removed edges and the extent of perturbation of the remaining association values.

Figure 1: Algorithms that measure the cost of a cut by the sum of its edge weights will not remove the weak, spurious edges $e_1, e_2$ between components A and B of this graph until after “dangling” connections like $e_3, e_4$ are severed.
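The simulation module in the list above can be sketched as follows. The perturbation model is a hypothetical choice made for illustration (additive Gaussian noise on the binary weights of $\Gamma$, clipped to $[0, 1]$, followed by deletion of randomly chosen edges), not necessarily the one used in the experiments; the function name is also hypothetical.

```python
import random
from itertools import combinations


def simulate_association_graph(labels, noise, n_removed, rng):
    """Perturb a partition graph Gamma into an association graph G.

    labels:    labels[i] is the identity of node i in Gamma
    noise:     standard deviation of the (hypothetical) Gaussian perturbation
    n_removed: number of edges deleted to simulate unspecified associations
    rng:       a random.Random instance, for reproducibility
    """
    assoc = {}
    for i, j in combinations(range(len(labels)), 2):
        p = 1.0 if labels[i] == labels[j] else 0.0   # binary weight in Gamma
        q = p + rng.gauss(0.0, noise)                 # perturbed value
        assoc[(i, j)] = min(1.0, max(0.0, q))         # clip into [0, 1]
    for pair in rng.sample(sorted(assoc), n_removed):
        del assoc[pair]                               # unspecified edge
    return assoc


g = simulate_association_graph([0, 0, 1, 1, 2], noise=0.2, n_removed=3,
                               rng=random.Random(0))
print(len(g))  # 10 pairs minus 3 removed edges = 7
```

Sweeping `noise` and `n_removed` gives the two axes along which the display module is expected to show decreasing likelihood.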


2.4 The Additivity Problem

After implementing several of these algorithms in Matlab and testing them in simulation, we realized that this additivity leads to brittleness in the presence of spurious association values $p_{ij}$. Figure 1 illustrates this problem. In this figure, two strongly associated components A, B of a graph are tied to each other by two edges $e_1, e_2$ with relatively low association values. Any algorithm that is based on additive cut measures will end up preserving this weak connection at least until after “dangling” edges such as $e_3, e_4$ are severed.

The algorithm by Saran and Vazirani [8], which finds a $k$-cut of a graph whose weight is within a factor $2 - 2/k$ of the minimum, may serve as an illustration. This method requires specifying the number $k$ of identities (eventual graph components), which is generally unknown. To address this difficulty, we embedded the $k$-cut computation within an estimator that models $k$ as a Dirichlet process [7]: we draw from this process to hypothesize $k$, and we solve the corresponding $k$-cut problem, as described in Section 2.3.

In our experiments, this algorithm needed large values of $k$ in order to sever weak connections with a few edges (as exemplified by edges $e_1, e_2$ in Figure 1) if there exist several, somewhat stronger connections with individual, otherwise isolated nodes (as exemplified by edges $e_3, e_4$ in the figure). At these high values of $k$, the likelihood that unwanted cuts are also made becomes large, and estimated partitions become meaningless. We call this issue the additivity problem.
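The effect can be reproduced on a toy version of Figure 1. The sketch below uses made-up edge weights and computes exact minimum $k$-cuts by brute force over all partitions into $k$ nonempty blocks; it is not the Saran and Vazirani approximation algorithm, and it is only practical for a handful of nodes. The weak bridge $e_1, e_2$ (total weight 0.4) survives the minimum 2-cut and 3-cut, because isolating a dangling node costs only 0.3, and it is cut only at $k = 4$.

```python
def set_partitions(n):
    """Yield every partition of range(n) as a restricted-growth label list."""
    def extend(labels, n_blocks):
        if len(labels) == n:
            yield list(labels)
            return
        for c in range(n_blocks + 1):         # an existing block or a new one
            labels.append(c)
            yield from extend(labels, max(n_blocks, c + 1))
            labels.pop()
    yield from extend([], 0)


def min_k_cut(n, edges, k):
    """Exact minimum k-cut by exhaustive search (tiny graphs only)."""
    best = None
    for labels in set_partitions(n):
        if max(labels) + 1 != k:              # keep partitions with exactly k blocks
            continue
        cost = sum(w for (i, j), w in edges.items() if labels[i] != labels[j])
        if best is None or cost < best[0]:
            best = (cost, labels)
    return best


# Hypothetical weights shaped like Figure 1.
EDGES = {
    (0, 1): 0.9, (0, 2): 0.9, (1, 2): 0.9,   # component A = {0, 1, 2}
    (3, 4): 0.9, (3, 5): 0.9, (4, 5): 0.9,   # component B = {3, 4, 5}
    (2, 3): 0.2, (1, 4): 0.2,                # weak edges e1, e2 between A and B
    (0, 6): 0.3, (5, 7): 0.3,                # "dangling" edges e3, e4
}
for k in (2, 3, 4):
    cost, labels = min_k_cut(8, EDGES, k)
    print(k, round(cost, 2), "A-B cut" if labels[2] != labels[3] else "A-B kept")
# 2 0.3 A-B kept
# 3 0.6 A-B kept
# 4 1.0 A-B cut
```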
2.5 The Partition Filtration


To address the additivity problem, we developed a method that is based on the concept of a partition filtration. This concept leads to straightforward estimates of high-likelihood cuts.

A partition $\Gamma_2$ is said to be a refinement of a partition $\Gamma_1$ of the same set of nodes,

$$ \Gamma_1 \preceq \Gamma_2 , $$

if every set of $\Gamma_2$ is contained in (or possibly coincident with) a set of $\Gamma_1$.

Given an association graph $G$ with vertex set $V$ and association values (edge weights) $p_{ij}$ between nodes $i$ and $j$, form the complete graph $\bar{G}$ by replacing missing edges with zero-weight edges. Then, if $\pi$ is a real number between 0 and 1, form the partition $\Gamma(\bar{G}, \pi)$ as the set of connected components of the binary graph with vertex set $V$ and with an edge between nodes $i$ and $j$ if and only if

$$ p_{ij} \ge \pi . $$

In other words, keep the edges of $\bar{G}$ that have weight at least $\pi$, and compute the corresponding connected components.
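Computing $\Gamma(\bar{G}, \pi)$ amounts to thresholding followed by connected components. A minimal sketch with a union-find structure follows, again assuming the hypothetical `assoc` dictionary representation; pairs absent from `assoc` are read as the zero-weight edges of $\bar{G}$. It checks all pairs, so it costs $O(|V|^2)$ edge tests, which is adequate for illustration.

```python
def threshold_partition(n_nodes, assoc, pi):
    """Gamma(G_bar, pi): components of the graph keeping edges with weight >= pi.

    assoc maps (i, j) -> p_ij; pairs missing from it are the zero-weight
    edges of the completed graph G_bar. Returns canonical labels (classes
    numbered in order of first appearance).
    """
    parent = list(range(n_nodes))            # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if assoc.get((i, j), 0.0) >= pi:
                parent[find(j)] = find(i)    # merge the two components
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n_nodes)]


assoc = {(0, 1): 0.9, (1, 2): 0.4, (2, 3): 0.8}
print(threshold_partition(4, assoc, 0.5))   # [0, 0, 1, 1]
```

Raising `pi` can only remove edges, so the partitions returned for increasing thresholds refine one another.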

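Then, the parametric family of partitions $\Gamma(\bar{G}, \pi)$ is a filtration, in the sense that

$$ 0 \le \pi_1 \le \pi_2 \le 1 \implies \Gamma(\bar{G}, \pi_1) \preceq \Gamma(\bar{G}, \pi_2) . $$

In this filtration, $\Gamma(\bar{G}, 0)$ is the complete graph on $V$ (a single component) and $\Gamma(\bar{G}, 1)$ is the trivial graph where every node in $V$ is a separate component (empty graph), provided that all association values are strictly less than 1.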

Then, we fix a real parameter $\epsilon$ between 0 and 1/2, and define the $\epsilon$-set of reference partitions $\Gamma_1, \ldots, \Gamma_C$ in equation (3) to be the set of all partitions in the family $\Gamma(\bar{G}, \pi)$ such that

$$ \frac{1}{2} - \epsilon \le \pi \le \frac{1}{2} + \epsilon . $$
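Because $\Gamma(\bar{G}, \pi)$ only changes when $\pi$ crosses one of the association values, the $\epsilon$-set can be enumerated by evaluating the threshold partition at finitely many probe values. The sketch below uses hypothetical names and the same `assoc` representation as before, repeats a compact threshold function so it runs on its own, and returns the distinct partitions from coarsest to finest.

```python
def threshold_partition(n_nodes, assoc, pi):
    """Gamma(G_bar, pi) as canonical labels; missing pairs have weight 0."""
    parent = list(range(n_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if assoc.get((i, j), 0.0) >= pi:
                parent[find(j)] = find(i)
    roots = {}
    return tuple(roots.setdefault(find(i), len(roots)) for i in range(n_nodes))


def epsilon_set(n_nodes, assoc, eps):
    """Distinct partitions Gamma(G_bar, pi) for 1/2 - eps <= pi <= 1/2 + eps."""
    lo, hi = 0.5 - eps, 0.5 + eps
    # Gamma(G_bar, pi) is constant between consecutive association values, so
    # it suffices to evaluate it at lo, at hi, and at every value in (lo, hi].
    probes = sorted({lo, hi} | {p for p in assoc.values() if lo < p <= hi})
    result = []
    for pi in probes:
        part = threshold_partition(n_nodes, assoc, pi)
        if part not in result:       # neighboring probes may give the same partition
            result.append(part)
    return result


assoc = {(0, 1): 0.9, (1, 2): 0.4, (2, 3): 0.8}
print(epsilon_set(4, assoc, 0.25))   # [(0, 0, 0, 0), (0, 0, 1, 1)]
```

With `eps = 0.5` the same function returns the complete filtration, and with `eps = 0` only the standard partition at $\pi = 1/2$.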

With this definition, the computation of equation (3), the crucial component of the identity management problem, has become very efficient.

The parameter $\epsilon$ determines how far one is willing to go from the standard partition $\Gamma(\bar{G}, 1/2)$, which essentially takes the association probabilities $p_{ij}$ at their face values. Note that missing associations are treated as evidence of no association in this context, that is, they are made equivalent to $p_{ij} = 0$. This reflects the conservative stance whereby identities are equated only in the presence of positive evidence.

The first panel in Figure 2 shows a random association graph. Blue edges have weights significantly higher than 1/2, and red edges have weights significantly lower than 1/2. Thus, edges in this graph simulate observations that have good confidence, that is, whose association values are far from 1/2. The remaining panels in the figure show all the partitions in the complete filtration, obtained when $\epsilon = 1/2$. Each class in a partition is shown by its clique, and the range of $\pi$ values for which each partition is valid is shown at the top of each graph. In terms of a partition distance [5], which measures how many nodes have to be moved between two partitions to make them equal, the partitions in the filtration are very different from each other. In terms of the compatibility function defined in equation (1), on the other hand, the partitions for small values of $\epsilon$ are very compatible with each other, because they differ by a small number of edges. Thus, filtration partitions and compatibility work well together in the definition (3) of the probability distribution over association graphs. First, graphs that differ by a small number of edges are mutually compatible. Second, the partitions in a filtration for a given association graph provide a set of reference graphs that account well for uncertainty in the association values. Third, the extent of the range of $\pi$ for a partition can be used as a measure of the persistence of that partition to changes in the threshold $\pi$. For instance, the partition with highest persistence in Figure 2 is the left graph in the third row (boxed), with a value of persistence equal to $0.87 - 0.27 = 0.6$.
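Persistence can be computed from the same construction by recording, for every partition in the complete filtration, the range of $\pi$ over which it holds. The sketch below uses made-up association values, not the data of Figure 2, hypothetical function names, and measures persistence as the length of the half-open range $(\pi_{\text{low}}, \pi_{\text{high}}]$.

```python
def threshold_partition(n_nodes, assoc, pi):
    """Gamma(G_bar, pi) as canonical labels; missing pairs have weight 0."""
    parent = list(range(n_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if assoc.get((i, j), 0.0) >= pi:
                parent[find(j)] = find(i)
    roots = {}
    return tuple(roots.setdefault(find(i), len(roots)) for i in range(n_nodes))


def filtration_ranges(n_nodes, assoc):
    """Every partition of the complete filtration with its range (low, high] of pi.

    Gamma(G_bar, pi) is constant on each interval (a, b] between consecutive
    breakpoints and equals Gamma(G_bar, b) there. The single point pi = 0,
    where the zero-weight edges also survive, has zero length and is skipped.
    """
    cuts = sorted({0.0, 1.0} | {p for p in assoc.values() if 0.0 < p < 1.0})
    ranges = []                                # entries [partition, low, high]
    for a, b in zip(cuts, cuts[1:]):
        part = threshold_partition(n_nodes, assoc, b)
        if ranges and ranges[-1][0] == part:
            ranges[-1][2] = b                  # same partition: extend its range
        else:
            ranges.append([part, a, b])
    return [tuple(r) for r in ranges]


def most_persistent(n_nodes, assoc):
    """Partition with the longest range of pi, i.e. the highest persistence."""
    return max(filtration_ranges(n_nodes, assoc), key=lambda r: r[2] - r[1])


assoc = {(0, 1): 0.95, (1, 2): 0.2, (2, 3): 0.85}   # made-up values
print(most_persistent(4, assoc))   # ((0, 0, 1, 1), 0.2, 0.85)
```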

In other words, the partition filtration is an efficient solution to the batch version of identity management, and can therefore be used for initialization in the online version.
Figure 2: Top left: An observation graph. Blue edges correspond to strong associations, red edges to weak ones. Other panels: Partitions in the complete filtration for the graph at top left. The ranges of $\pi$ that result in each partition are shown above each partition. The boxed partition is the most persistent one.


A Bayesian Filtering


Simple manipulation [3] shows that the following recursive formula holds for the posterior probability distribution $p(X(t) \mid Y(t))$:

$$ p(X(t + \Delta t) \mid Y(t + \Delta t)) = p(X(t) \mid Y(t)) \, \frac{p(y(t + \Delta t) \mid x(t + \Delta t)) \, p(x(t + \Delta t) \mid x(t))}{p(y(t + \Delta t) \mid Y(t))} \qquad (4) $$

On the right-hand side, $p(X(t) \mid Y(t))$ is the old posterior, for which an initial value $p(X(0) \mid Y(0))$ is known, while $p(y(t + \Delta t) \mid x(t + \Delta t))$ is the measurement model and $p(x(t + \Delta t) \mid x(t))$ is the transition model.

The remaining normalization term, $p(y(t + \Delta t) \mid Y(t))$, is in general difficult to compute analytically. This suggests [3] using a sampling approach, because Monte Carlo sampling does not require knowing the normalization factor. Instead of computing the right-hand side of equation (4), one can thus sample from it, thereby representing the distribution on the left-hand side through a collection of samples. This approach in turn requires the ability to draw samples from the old posterior $p(X(t) \mid Y(t))$, and to evaluate both the transition model and the measurement model $p(y(t) \mid x(t))$ pointwise.

Sampling from the old posterior $p(X(t) \mid Y(t))$ directly is also difficult. Recursive importance sampling [1] circumvents this difficulty by sampling instead from a simpler distribution $\pi(X(t) \mid Y(t))$, called the importance function, whose support includes that of the old posterior, and which is designed to have the following form to enable recursive computation:

$$ \pi(X(t) \mid Y(t)) = \pi(x(0) \mid y(0)) \prod_{0 < t_k \le t} \pi(x(t_k) \mid x(t_0), \ldots, x(t_{k-1}), Y(t_k)) . \qquad (5) $$

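It can then be shown [1] that the following Sequential Importance Sampling (SIS) framework yields posterior weighted samples $x_k^{(i)}$ with weights $\tilde{w}_k^{(i)}$ at time $t_k$ for $k = 0, 1, \ldots$:

• For $i = 1, \ldots, N$, sample $x_k^{(i)} \sim \pi(x \mid x_0^{(i)}, \ldots, x_{k-1}^{(i)}, Y(t_k))$.

• For $i = 1, \ldots, N$, evaluate the unnormalized importance weights:

$$ w_k^{(i)} = w_{k-1}^{(i)} \, \frac{p(y(t_k) \mid x_k^{(i)}) \, p(x_k^{(i)} \mid x_{k-1}^{(i)})}{\pi(x_k^{(i)} \mid x_0^{(i)}, \ldots, x_{k-1}^{(i)}, Y(t_k))} $$

• For $i = 1, \ldots, N$, normalize the importance weights:

$$ \tilde{w}_k^{(i)} = \frac{w_k^{(i)}}{\sum_{j=1}^{N} w_k^{(j)}} $$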

Because of the mixture form (3) of the conditional probability density $p(x \mid y)$, initialization amounts to estimating the number $C$ of mixture components, the $C$ reference equivalence graphs in the vector $\gamma = (\Gamma_1, \ldots, \Gamma_C)^T$, and the $C$ dispersion parameters in the vector $\sigma = (\sigma_1, \ldots, \sigma_C)^T$.

Sequential importance sampling requires defining an importance function of the form (5) from which it is easy to sample. While the requirements on this function are very mild as far as statistical convergence is concerned, efficiency demands that the function approximate the true posterior well. In the context of identity management, this means that the term

$$ \pi(x(t_k) \mid x(t_0), \ldots, x(t_{k-1}), Y(t_k)) $$

that appears in equation (5) should be a plausible probabilistic description of what new edges appear in the equivalence graph that describes estimated identities as new observations become available.

We propose to characterize this distribution by a random walk in the space of equivalence graphs. A walk must move from graph to graph, and steps are easily generated: the start equivalence graph is merely a partition of its nodes, and a new partition can be generated by moving random nodes between random classes of the partition.

In order to condition the resulting walk on the observations $y(t)$, we use the Metropolis idea [2]: evaluate the likelihood ratio $r = p(\omega')/p(\omega)$, where $\omega$ is the old partition and $\omega'$ is the new one. If $r$ is greater than one, accept the new step. Otherwise, accept it with probability $r$, and reject it with probability $1 - r$.
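The random walk and the acceptance rule can be sketched as follows, working with log probabilities to avoid underflow. The target `log_p` is a hypothetical example built from the compatibility of equation (1) and a made-up association graph. One caveat: a move into a newly created class is not proposed with the same probability as its reverse move, so this plain Metropolis rule, without a Hastings correction for that asymmetry, does not sample exactly from $p$; it implements the acceptance rule as stated above.

```python
import math
import random


def metropolis_step(labels, log_p, rng):
    """One random-walk step on partitions, accepted with the Metropolis rule.

    labels: current partition omega, as a list of integer class ids
    log_p:  returns log p(omega) up to an additive constant
    rng:    a random.Random instance
    """
    proposal = list(labels)
    node = rng.randrange(len(labels))
    classes = sorted(set(labels))
    # Move a random node to a random existing class or to a new class.
    proposal[node] = rng.choice(classes + [max(classes) + 1])
    log_r = log_p(proposal) - log_p(labels)    # log r = log p(omega') - log p(omega)
    if log_r >= 0 or rng.random() < math.exp(log_r):
        return proposal, True                  # accept the new partition
    return list(labels), False                 # reject: stay at omega


# Hypothetical target: p(omega) proportional to exp(-d(omega, G) / 0.02),
# with d the compatibility of equation (1) and G a made-up association graph.
G = {(0, 1): 0.95, (2, 3): 0.9, (1, 2): 0.05}

def log_p(labels):
    d = sum(abs((1.0 if labels[i] == labels[j] else 0.0) - q)
            for (i, j), q in G.items())
    return -d / 0.02

rng = random.Random(0)
state = [0, 1, 2, 3]                           # start from all singletons
for _ in range(2000):
    state, _ = metropolis_step(state, log_p, rng)
print(state)   # class ids vary; nodes 0, 1 and nodes 2, 3 should share classes
```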

So far, it has been assumed that the set $Y(t)$ of measurements grows over time, and that the state $X(t)$ grows with it. An efficient recursive procedure, on the other hand, relies on $X(t)$ being bounded in size. This is achieved by forgetting old observations. “Old” here can be measured in a principled fashion by referring to two related probability distributions. The first is the probability $p_{j|i}$ that a target observed at sensor $i$ subsequently appears at sensor $j$ first. The second is the conditional probability density $\pi_{j|i}(\tau)$ that such a target arrives at $j$ after a time interval $\tau$. Then, given a small probability $\epsilon$, an observation made at sensor $i$ a time $t$ ago is removed if

$$ \max_{j \,:\, p_{j|i} > \epsilon} \int_{\tau > t} \pi_{j|i}(\tau) \, d\tau < \epsilon . $$

In words, an observation is forgotten when it is so old that the probability that its target has not yet reappeared elsewhere is negligible.
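The forgetting rule is straightforward to evaluate once, for each likely destination $j$, the probability that a target heading there has not yet arrived after a time $t$ is available. The sketch below assumes, purely for illustration, exponential travel times with made-up rates $\lambda_j$, for which that tail probability is $e^{-\lambda_j t}$; the function names are hypothetical, and treating "no likely destination" as grounds for forgetting is an added assumption.

```python
import math


def should_forget(elapsed, next_sensor_probs, survival, eps):
    """Forgetting rule for an observation made at sensor i `elapsed` time units ago.

    next_sensor_probs: dict {j: p_{j|i}} of first-reappearance probabilities
    survival:          function (j, t) -> probability that travel time exceeds t
    eps:               the small probability epsilon
    """
    tails = [survival(j, elapsed)
             for j, p in next_sensor_probs.items() if p > eps]
    # Assumption: with no sufficiently likely destination, nothing is left to wait for.
    return max(tails, default=0.0) < eps


# Hypothetical example: exponential travel times with per-sensor rates.
rates = {1: 0.1, 2: 0.5}                    # arrivals per second (made up)

def exp_survival(j, t):
    return math.exp(-rates[j] * t)          # P(tau > t) for an exponential law

probs = {1: 0.7, 2: 0.3}
print(should_forget(40.0, probs, exp_survival, 0.01))   # False: exp(-4) > 0.01
print(should_forget(50.0, probs, exp_survival, 0.01))   # True:  exp(-5) < 0.01
```

The slowest likely destination dominates the decision, which is why the maximum over $j$ appears in the rule.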